{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "244a7cbc-4f1d-4560-be4a-a8136190d5df",
   "metadata": {},
   "source": [
    "# Auto Merging Retriever Pack\n",
    "\n",
    "This LlamaPack provides an example of our auto-merging retriever."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e770648-429c-42a3-b221-cd8b3f247894",
   "metadata": {},
   "source": [
    "### Setup Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dd5ccb0b-eae9-465c-822e-bdb20b7819b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget \"https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1\" -O paul_graham_essay.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "19b852e8-2503-4d69-949c-0a53c4d34d54",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import SimpleDirectoryReader\n",
    "\n",
    "# load in some sample data\n",
    "reader = SimpleDirectoryReader(input_files=[\"paul_graham_essay.txt\"])\n",
    "documents = reader.load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2cfa163-f792-4778-ad25-f0c4551d6ce3",
   "metadata": {},
   "source": [
    "### Download and Initialize Pack"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "978f8848-c3f9-4002-b3ff-db7db1e1e0b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llama_pack import download_llama_pack\n",
    "\n",
    "AutoMergingRetrieverPack = download_llama_pack(\n",
    "    \"AutoMergingRetrieverPack\",\n",
    "    \"./auto_merging_retriever_pack\",\n",
    "    # leave the below commented out (was for testing purposes)\n",
    "    # llama_hub_url=\"https://raw.githubusercontent.com/run-llama/llama-hub/jerry/add_llama_packs/llama_hub\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "28e5da6f-8cb4-418c-9974-bc8f8575782d",
   "metadata": {},
   "outputs": [],
   "source": [
    "auto_merging_pack = AutoMergingRetrieverPack(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd63f606-ba3e-4de1-b02c-83669b91b411",
   "metadata": {},
   "source": [
    "### Run Pack"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "2b0f96c6-4c44-4881-a848-8d336525e685",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "> Merging 1 nodes into parent node.\n",
      "> Parent node id: 3e460bf7-6f55-4941-a7a7-72a33972fa8b.\n",
      "> Parent node text: I wanted to do something completely different, so I decided I'd paint. I wanted to see how good I...\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# this will run the full pack\n",
    "response = auto_merging_pack.run(\"What did the author do during his time in YC?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "6cec221f-19aa-4e00-88d5-51cf357d4d2f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "During his time in YC, the author worked on YC, wrote essays, and also worked on other projects. However, as YC grew and the author became more excited about it, YC started to take up more of his attention. Eventually, the author reduced his projects to two: writing essays and working on YC.\n"
     ]
    }
   ],
   "source": [
    "print(str(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "2f582081-8637-444f-be79-2068af96aab1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(response.source_nodes)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f2e5ba1-03d9-4185-802b-04eaf1740024",
   "metadata": {},
   "source": [
    "### Inspect Modules"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "446f4a99-5d41-472b-ab47-bddaaec01835",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'node_parser': HierarchicalNodeParser(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x12847e8c0>, chunk_sizes=[2048, 512, 128], node_parser_ids=['chunk_size_2048', 'chunk_size_512', 'chunk_size_128'], node_parser_map={'chunk_size_2048': SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x12847e8c0>, chunk_size=2048, chunk_overlap=20, separator=' ', paragraph_separator='\\n\\n\\n', secondary_chunking_regex='[^,.;。？！]+[,.;。？！]?'), 'chunk_size_512': SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x12847e8c0>, chunk_size=512, chunk_overlap=20, separator=' ', paragraph_separator='\\n\\n\\n', secondary_chunking_regex='[^,.;。？！]+[,.;。？！]?'), 'chunk_size_128': SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x12847e8c0>, chunk_size=128, chunk_overlap=20, separator=' ', paragraph_separator='\\n\\n\\n', secondary_chunking_regex='[^,.;。？！]+[,.;。？！]?')}),\n",
       " 'retriever': <llama_index.retrievers.auto_merging_retriever.AutoMergingRetriever at 0x2ba072500>,\n",
       " 'query_engine': <llama_index.query_engine.retriever_query_engine.RetrieverQueryEngine at 0x177a6bc70>}"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "modules = auto_merging_pack.get_modules()\n",
    "display(modules)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "02b4ad77-4afc-4462-ab50-6d9518ac7717",
   "metadata": {},
   "outputs": [],
   "source": [
    "# get the node parser\n",
    "node_parser = auto_merging_pack.node_parser\n",
    "\n",
    "# get the retriever\n",
    "retriever = auto_merging_pack.retriever\n",
    "\n",
    "# get the query engine\n",
    "query_engine = auto_merging_pack.query_engine"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_hub",
   "language": "python",
   "name": "llama_hub"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
